-
Abstract: Common bean (Phaseolus vulgaris), a staple food in Latin America and Africa, serves as a vital source of energy, protein, and essential minerals for millions of people. However, the genomics knowledge that breeders could leverage to improve this crop is scarce. We have developed and validated a comparative genomics approach to predict conserved transcription factor binding sites (TFBS) in common bean and used it to study gene regulatory networks. We analyzed promoter regions and identified TFBS for 12,631 bean genes, with an average of 6 conserved motifs per gene. Moreover, we discovered a statistically significant relationship between the number of conserved motifs and the amount of available experimental evidence of gene regulation. Notably, the ERF, MYB, and bHLH transcription factor families dominated the conserved motifs, with implications for starch biosynthesis regulation. Furthermore, we provide gene regulatory data as a resource that can be interrogated for the regulatory landscape of any set of genes. Our results underscore the significance of TFBS conservation in legumes and align with the notion that core genes often exhibit a more conserved regulatory makeup. The study demonstrates the effectiveness of a comparative genomics approach for addressing genome information gaps in non-model organisms and provides valuable insights into the regulatory networks governing starch biosynthesis genes that can support crop improvement programs.
Free, publicly-accessible full text available December 1, 2026
-
Abstract: Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures, including alternative splicing and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
-
Abstract: Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by the more than 500 000 genomes sequenced to date. By different estimates, only half of the predicted proteins carry an accurate functional annotation, and this percentage varies drastically between organismal lineages. Such a large gap in knowledge hampers all aspects of the biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting funded by the National Science Foundation was held on 3–4 February 2022 to address this issue. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues obstructing the field were identified and discussed, which ultimately allowed solutions to be proposed for moving forward.
